Abstract: A framework is proposed that allows for a joint
description and optimization of both binary polar coding and
the multilevel coding (MLC) approach for $2^m$-ary digital
pulse-amplitude modulation (PAM). The conceptual equivalence of
polar coding and multilevel coding is pointed out in detail. Based
on a novel characterization of the channel polarization
phenomenon, rules for the optimal choice of the bit labeling in this
coded modulation scheme employing polar codes are developed.
Simulation results for the AWGN channel are included.

I. Introduction 

Polar codes [1] are known as a low-complexity binary coding
scheme that provably approaches the capacity of arbitrary
symmetric binary-input discrete memoryless channels (B-DMCs).
The generalization to $q$-ary channels ($q > 2$) has been the
subject of various works, cf., e.g., [3]. However, the topic
of polar-coded modulation, i.e., the combination of $2^m$-ary
digital PAM and binary polar codes for increased
spectral efficiency, has hardly been addressed so far. In [3], a
transmission scheme for polar codes with bit-interleaved coded
modulation (BICM) [4] has been proposed, focusing on the
interleaver design.

In this paper, we consider the multilevel coding (MLC)
construction [5], [6] for memoryless channels like the AWGN
channel (no fading).

It has been observed (e.g., [7]) that the MLC approach
is closely related to that of polar coding on a conceptual
level. Based on these similarities, we propose a framework 
that allows us to completely describe both polar coding and
$2^m$-ary modulation in a unified context as certain channel
transforms. This unified description enables us to design
optimized constellation-dependent coding schemes for MLC.

The paper is organized as follows: In Sec. II, the framework
for a joint description of polar coding and $2^m$-ary PAM
is developed. This framework is then used for
describing the polar coding construction in Sec. III, leading
to a novel interpretation of the polarization phenomenon. The
optimum combination of binary polar coding and multilevel
coding is discussed in Sec. IV, followed by simulation results
for the AWGN channel in Sec. V.

II. Channel Transforms 

A. Sequential Binary Partitions 

Let $\mathsf{W}: \mathcal{X} \to \mathcal{Y}$ be a discrete memoryless channel (DMC)
with input symbols $x \in \mathcal{X}$ (alphabet size $|\mathcal{X}| = 2^k$), output
symbols $y \in \mathcal{Y}$ from an arbitrary alphabet $\mathcal{Y}$, and mutual
information $I(X;Y)$. We define an order-$k$ sequential binary
partition ($k$-SBP) $\varphi$ of $\mathsf{W}$ to be a channel transform

$$ \varphi:\ \mathsf{W} \to \{\mathsf{B}_\varphi^{(0)}, \ldots, \mathsf{B}_\varphi^{(k-1)}\} \tag{1} $$

that maps $\mathsf{W}$ to an ordered set of $k$ binary-input DMCs
(B-DMCs), which we will refer to as bit channels. For any given
$\mathsf{W}$, such a $k$-SBP is characterized by a binary labeling rule $\mathcal{L}_\varphi$
that maps binary $k$-tuples bijectively to the $2^k$ input symbols
$x \in \mathcal{X}$:

$$ \mathcal{L}_\varphi:\ [b_0, b_1, \ldots, b_{k-1}] \in \{0,1\}^k \mapsto x \in \mathcal{X} . \tag{2} $$

The number of possible labelings equals $(2^k)!$.

Each bit channel $\mathsf{B}_\varphi^{(i)}$ ($0 \le i < k$) of a $k$-SBP is supposed
to have knowledge of the output of $\mathsf{W}$ as well as of the
values transmitted over the bit channels of smaller indices
$\mathsf{B}_\varphi^{(0)}, \ldots, \mathsf{B}_\varphi^{(i-1)}$. Thus, we have

$$ \mathsf{B}_\varphi^{(i)}:\ \{0,1\} \to \mathcal{Y} \times \{0,1\}^i , \quad 0 \le i < k . \tag{3} $$

The mutual information between channel input and output of
$\mathsf{B}_\varphi^{(i)}$ assuming equiprobable input symbols is therefore given
by

$$ I(\mathsf{B}_\varphi^{(i)}) := I(B_i; Y \mid B_0, \ldots, B_{i-1}) , \tag{4} $$

which we will refer to as the (symmetric) bit channel capacity
of $\mathsf{B}_\varphi^{(i)}$. (If $\mathsf{W}$ is a symmetric channel, this value in fact
equals the channel capacity.) The mutual information of $\mathsf{W}$
is preserved under the transform $\varphi$, i.e.,

$$ \sum_{i=0}^{k-1} I(\mathsf{B}_\varphi^{(i)}) = I(X;Y) , \tag{5} $$

which directly follows from the well-known chain rule of
mutual information.

Considering polar-coded modulation, we show that the code
construction can be described by SBPs. We are particularly
interested in two properties of SBPs, namely the mean value and
the variance of the bit channel capacities, defined respectively
as

$$ M_\varphi(\mathsf{W}) := \frac{1}{k} \sum_{i=0}^{k-1} I(\mathsf{B}_\varphi^{(i)}) = \frac{1}{k} I(X;Y) , \tag{6} $$

$$ V_\varphi(\mathsf{W}) := \frac{1}{k} \sum_{i=0}^{k-1} I(\mathsf{B}_\varphi^{(i)})^2 - M_\varphi(\mathsf{W})^2 . \tag{7} $$

Clearly, from (6) the mean value $M_\varphi(\mathsf{W})$ in fact depends only
on the channel $\mathsf{W}$, rather than on the transform $\varphi$. It represents
the average (symmetric) capacity of $\mathsf{W}$ per transmitted binary
symbol.

¹A short remark on the notation: Channels are denoted by sans serif fonts,
capital roman letters stand for random variables, while boldfaced symbols
denote vectors or matrices.

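The variance of an SBP $\varphi$ is upper-bounded by

$$ V_\varphi(\mathsf{W}) \le M_\varphi(\mathsf{W}) \left(1 - M_\varphi(\mathsf{W})\right) \tag{8} $$

with equality if and only if all $I(\mathsf{B}_\varphi^{(i)})$ are either 0 or 1. This follows
from

$$ \begin{aligned}
V_\varphi(\mathsf{W}) &= \frac{1}{k} \sum_{i=0}^{k-1} I(\mathsf{B}_\varphi^{(i)})^2 - M_\varphi(\mathsf{W})^2 \\
&\le \frac{1}{k} \sum_{i=0}^{k-1} I(\mathsf{B}_\varphi^{(i)}) - M_\varphi(\mathsf{W})^2 \\
&= M_\varphi(\mathsf{W}) \left(1 - M_\varphi(\mathsf{W})\right)
\end{aligned} \tag{9} $$

and $0 \le I(\mathsf{B}_\varphi^{(i)}) \le 1$ for all $0 \le i < k$. Note that this upper
bound does not depend on the particular labeling $\mathcal{L}_\varphi$ but only
on the channel $\mathsf{W}$.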
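
The quantities in (4) to (8) can be checked numerically on a small example. The following is a minimal sketch, assuming a hypothetical 4-ary DMC with a hand-picked transition matrix `W` and two illustrative labelings (natural binary and Gray); none of these values are taken from the paper. It computes the bit channel capacities of a 2-SBP and their mean and variance.

```python
import math


def mutual_information(joint):
    """Return I(A;B) in bits for a joint pmf given as {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0.0)


def bit_channel_capacities(W, labeling):
    """Symmetric capacities I(B_i; Y | B_0..B_{i-1}) of a k-SBP, cf. (4).

    W[x][y] is P(y | x); labeling[x] is the k-bit label (a tuple) of input x.
    """
    k = len(labeling[0])
    px = 1.0 / len(W)  # equiprobable input symbols
    caps = []
    for i in range(k):
        joint = {}
        for x, row in enumerate(W):
            bits = labeling[x]
            for y, p in enumerate(row):
                # Uniform inputs and a bijective labeling make the label bits
                # i.i.d. uniform, so conditioning on B_0..B_{i-1} equals
                # treating them as part of the channel output, cf. (3).
                key = (bits[i], (y, bits[:i]))
                joint[key] = joint.get(key, 0.0) + px * p
        caps.append(mutual_information(joint))
    return caps


def mean_and_variance(caps):
    """Mean (6) and variance (7) of a list of bit channel capacities."""
    mean = sum(caps) / len(caps)
    return mean, sum(c * c for c in caps) / len(caps) - mean * mean


# Hypothetical 4-ary channel: each symbol is confused only with its neighbours.
W = [[0.8, 0.2, 0.0, 0.0],
     [0.1, 0.8, 0.1, 0.0],
     [0.0, 0.1, 0.8, 0.1],
     [0.0, 0.0, 0.2, 0.8]]
natural = [(0, 0), (1, 0), (0, 1), (1, 1)]  # label [b_0, b_1], b_0 is the LSB
gray = [(0, 0), (1, 0), (1, 1), (0, 1)]

for name, labeling in (("natural", natural), ("Gray", gray)):
    caps = bit_channel_capacities(W, labeling)
    mean, var = mean_and_variance(caps)
    print(name, [round(c, 4) for c in caps], round(mean, 4), round(var, 4))
```

Both labelings yield the same mean, as (6) requires, while the variance generally differs between them and stays below the bound (8).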

An important subset of $k$-SBPs is formed by those transforms
whose labeling rules are described by binary bijective
linear mappings. Let $\mathsf{W} = (\mathsf{B}_0 \times \ldots \times \mathsf{B}_{k-1})$ be a vector
channel of $k$ independent arbitrary B-DMCs $\mathsf{B}_0, \ldots, \mathsf{B}_{k-1}$.
Then, we call the $k$-SBP

$$ \varphi:\ (\mathsf{B}_0 \times \ldots \times \mathsf{B}_{k-1}) \to \{\mathsf{B}_\varphi^{(0)}, \ldots, \mathsf{B}_\varphi^{(k-1)}\} \tag{10} $$

a linear $k$-SBP if its labeling rule is given by

$$ \mathcal{L}_\varphi:\ \mathbf{b} \in \mathbb{F}_2^k \mapsto \mathbf{b} \cdot \mathbf{A}_\varphi \in \mathbb{F}_2^k \tag{11} $$

with $\mathbf{b} := [b_0, b_1, \ldots, b_{k-1}]$ and $\mathbf{A}_\varphi$ being an invertible binary
$(k, k)$ matrix. Clearly, the number of possible linear $k$-SBPs
equals the number of non-singular binary $(k, k)$ matrices and
is significantly smaller than that of general $k$-SBPs.
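
To give a feeling for the size of the two sets, the following sketch compares the number of general labelings, $(2^k)!$, with the number of non-singular binary $(k, k)$ matrices. The latter count uses the standard product formula $\prod_{i=0}^{k-1} (2^k - 2^i)$, which is not stated in the text above; the function names are illustrative.

```python
import math


def num_labelings(k):
    """Number of bijective labelings of 2^k symbols with k-bit labels."""
    return math.factorial(2 ** k)


def num_linear_labelings(k):
    """Number of invertible binary (k, k) matrices: prod_i (2^k - 2^i)."""
    count = 1
    for i in range(k):
        # Row i may be any vector outside the span of the previous i rows.
        count *= 2 ** k - 2 ** i
    return count


for k in (1, 2, 3, 4):
    print(k, num_linear_labelings(k), num_labelings(k))
```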

B. Product Concatenation of SBPs 

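Under certain conditions, it is possible to concatenate two
(or more) SBPs in a product form. Let

$$ \varphi:\ \mathsf{W} \to \{\mathsf{B}_\varphi^{(0)}, \ldots, \mathsf{B}_\varphi^{(k_1-1)}\} \tag{12} $$

be an arbitrary $k_1$-SBP and

$$ \psi:\ (\mathsf{B}_0 \times \ldots \times \mathsf{B}_{k_2-1}) \to \{\mathsf{B}_\psi^{(0)}, \ldots, \mathsf{B}_\psi^{(k_2-1)}\} \tag{13} $$

a $k_2$-SBP that takes a vector channel of $k_2$ independent
B-DMCs $\mathsf{B}_0, \ldots, \mathsf{B}_{k_2-1}$ as an input. Each of the vector channels
$(\mathsf{B}_\varphi^{(i)})^{k_2}$, obtained by taking $k_2$ independent instances of $\mathsf{B}_\varphi^{(i)}$,
can be partitioned by $\psi$. Thus, $\varphi$ and $\psi$ may be concatenated
by considering the vector channel $\mathsf{W}^{k_2}$, leading to a product
SBP of order $k_1 k_2$:

$$ \varphi \otimes \psi:\ \mathsf{W}^{k_2} \to \{\mathsf{B}_{\varphi\otimes\psi}^{(0)}, \ldots, \mathsf{B}_{\varphi\otimes\psi}^{(k_1 k_2-1)}\} . \tag{14} $$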



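Here, the bit channels of $\varphi \otimes \psi$ are given by

$$ \mathsf{B}_{\varphi\otimes\psi}^{(k_2 i+j)}:\ \{0,1\} \to \mathcal{Y}^{k_2} \times \{0,1\}^{k_2 i+j} \tag{15} $$

with symmetric capacities

$$ I(\mathsf{B}_{\varphi\otimes\psi}^{(k_2 i+j)}) = I(B_{k_2 i+j}; Y_0, \ldots, Y_{k_2-1} \mid B_0, \ldots, B_{k_2 i+j-1}) , \tag{16} $$

$$ \sum_{j=0}^{k_2-1} I(\mathsf{B}_{\varphi\otimes\psi}^{(k_2 i+j)}) = k_2 \cdot I(\mathsf{B}_\varphi^{(i)}) \tag{17} $$

for all $0 \le i < k_1$ and $0 \le j < k_2$. We remark that the product
transform $\varphi \otimes \psi$ is completely determined in a unique way by
the individual SBPs $\varphi$ and $\psi$ since their bit channels imply a
fixed order.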

The product concatenation of SBPs does not influence the
mean value of the bit channel capacities

$$ M_{\varphi\otimes\psi}(\mathsf{W}^{k_2}) = M_\varphi(\mathsf{W}) \tag{18} $$

due to the chain rule of mutual information. However, the
variance of the bit channel capacities increases. It is given
by the sum of the variance of the first transform and the
averaged variance of the second transform around the bit
channel capacities of the first one:

$$ V_{\varphi\otimes\psi}(\mathsf{W}^{k_2}) = V_\varphi(\mathsf{W}) + \frac{1}{k_1} \sum_{i=0}^{k_1-1} V_\psi\left((\mathsf{B}_\varphi^{(i)})^{k_2}\right) . \tag{19} $$
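
The decomposition (19) can be traced back to the definitions in a few lines. The following derivation is a sketch that uses only (7), (17), and (18): the second step applies (7) to $\psi$ on the vector channel $(\mathsf{B}_\varphi^{(i)})^{k_2}$, whose mean bit channel capacity equals $I(\mathsf{B}_\varphi^{(i)})$ by (17).

```latex
\begin{align*}
V_{\varphi\otimes\psi}(\mathsf{W}^{k_2})
 &= \frac{1}{k_1 k_2} \sum_{i=0}^{k_1-1} \sum_{j=0}^{k_2-1}
    I\big(\mathsf{B}_{\varphi\otimes\psi}^{(k_2 i+j)}\big)^2 - M_\varphi(\mathsf{W})^2 \\
 &= \frac{1}{k_1} \sum_{i=0}^{k_1-1}
    \Big[ V_\psi\big((\mathsf{B}_\varphi^{(i)})^{k_2}\big)
          + I\big(\mathsf{B}_\varphi^{(i)}\big)^2 \Big] - M_\varphi(\mathsf{W})^2 \\
 &= V_\varphi(\mathsf{W})
    + \frac{1}{k_1} \sum_{i=0}^{k_1-1} V_\psi\big((\mathsf{B}_\varphi^{(i)})^{k_2}\big) .
\end{align*}
```

Since every variance term is non-negative, the variance cannot decrease under product concatenation.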

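If $\varphi$ and $\psi$ are linear SBPs with labeling rules specified by
$\mathbf{A}_\varphi$ and $\mathbf{A}_\psi$, respectively, then their product $\varphi \otimes \psi$ is again a
linear $k_1 k_2$-SBP with labeling rule

$$ \mathcal{L}_{\varphi\otimes\psi}:\ \mathbf{b} \in \mathbb{F}_2^{k_1 k_2} \mapsto \mathbf{b} \cdot \mathbf{P}_{k_1,k_2} \cdot (\mathbf{A}_\psi \otimes \mathbf{A}_\varphi) . \tag{20} $$

Here, $\mathbf{A}_\psi \otimes \mathbf{A}_\varphi$ denotes the Kronecker product of $\mathbf{A}_\psi$ and
$\mathbf{A}_\varphi$. $\mathbf{P}_{k_1,k_2}$ is the $(k_1 k_2, k_1 k_2)$ permutation matrix that maps
the $(k_2 i + j)$-th component of the vector $\mathbf{b}$ to position $i + k_1 j$
(for all $0 \le i < k_1$, $0 \le j < k_2$).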
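
The labeling rule (20) can be made concrete with a few lines of code. The following is a minimal sketch in plain Python (matrices as lists of rows, helper names are illustrative) that builds $\mathbf{P}_{k_1,k_2}$ and the product matrix $\mathbf{P}_{k_1,k_2} (\mathbf{A}_\psi \otimes \mathbf{A}_\varphi)$. As an example it concatenates the matrix [[1, 0], [1, 1]] with itself, assuming index conventions exactly as stated after (20).

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def matmul_gf2(A, B):
    """Matrix product over the binary field F_2."""
    return [[sum(a * b for a, b in zip(row, col)) % 2 for col in zip(*B)]
            for row in A]


def permutation_matrix(k1, k2):
    """P_{k1,k2}: component k2*i + j of b is moved to position i + k1*j."""
    n = k1 * k2
    P = [[0] * n for _ in range(n)]
    for i in range(k1):
        for j in range(k2):
            P[k2 * i + j][i + k1 * j] = 1
    return P


def product_labeling_matrix(A_phi, A_psi):
    """Matrix of the product labeling rule (20): P_{k1,k2} (A_psi kron A_phi)."""
    k1, k2 = len(A_phi), len(A_psi)
    return matmul_gf2(permutation_matrix(k1, k2), kron(A_psi, A_phi))


F = [[1, 0], [1, 1]]
for row in product_labeling_matrix(F, F):
    print(row)
```

For $k_1 = k_2 = 2$ the permutation swaps the two middle components, which is the bit-reversal permutation of length 4.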

III. Polar Codes 

Polar codes, as introduced by Arıkan [1], have been shown
to be a channel coding construction that provably achieves the
symmetric capacity of arbitrary binary-input discrete
memoryless channels (B-DMCs) under low-complexity encoding and
successive cancellation (SC) decoding. For the sake of simplicity,
we focus on Arıkan's original construction in this paper; the
generalization to polar codes based on different kernels (as
considered, e.g., in [8]) is straightforward. Furthermore, we
restrict our considerations to the SC decoding algorithm as
in [1]; however, our results regarding the code construction are
also valid for other (better performing) decoders that are based
on the SC algorithm, such as list decoding [9].

A. Code Construction 

Let $\mathsf{B}: \{0,1\} \to \mathcal{Y}$ be a B-DMC and $I(\mathsf{B})$ its
symmetric capacity, i.e., the mutual information of $\mathsf{B}$ assuming
equiprobable binary input symbols. The encoding operation for
a polar code of length $N$ may be described by multiplication
of a binary length-$N$ vector $\mathbf{u}$ (containing the information
symbols as well as some symbols with fixed values, the so-called
frozen symbols, that do not carry any information) with a
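generator matrix $\mathbf{G}_N$ that is defined by the recursive relation

$$ \mathbf{G}_N = \mathbf{B}_N \left( \mathbf{G}_2 \otimes \left( \mathbf{B}_{N/2} \mathbf{G}_{N/2} \right) \right) , \qquad \mathbf{G}_2 = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} , \tag{21} $$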
where $N$ is a power of two and $\otimes$ again denotes the Kronecker
product. $\mathbf{B}_N$ denotes the $(N, N)$ bit-reversal permutation
matrix [1]. Encoding takes place in the binary field $\mathbb{F}_2$. The
resulting codeword $\mathbf{c} = \mathbf{u} \mathbf{G}_N$ is then transmitted in $N$ time
steps over the channel $\mathsf{B}$.
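
The encoding step can be sketched without forming the generator matrix explicitly. The following minimal example assumes the closed form $\mathbf{G}_N = \mathbf{B}_N \mathbf{G}_2^{\otimes n}$ with $n = \log_2 N$, as given in [1]; the function names are illustrative. It first applies the bit-reversal permutation and then one butterfly stage per Kronecker factor.

```python
def bit_reverse(i, n):
    """Reverse the n-bit binary representation of the index i."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def polar_encode(u):
    """Return c = u * G_N over F_2, assuming G_N = B_N * G_2^(kron n)."""
    N = len(u)
    n = N.bit_length() - 1
    if N == 0 or 1 << n != N:
        raise ValueError("block length must be a power of two")
    # u * B_N: bit-reversal permutation of the input positions.
    x = [u[bit_reverse(t, n)] for t in range(N)]
    # x * G_2^(kron n): one butterfly stage per factor, [a, b] -> [a ^ b, b].
    h = 1
    while h < N:
        for start in range(0, N, 2 * h):
            for t in range(start, start + h):
                x[t] ^= x[t + h]
        h *= 2
    return x


def generator_matrix(N):
    """Rows of G_N, obtained by encoding the N unit vectors."""
    return [polar_encode([int(r == s) for s in range(N)]) for r in range(N)]


for row in generator_matrix(4):
    print(row)
```

The butterfly form needs on the order of $N \log_2 N$ binary additions, compared with $N^2$ for a plain vector-matrix product.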

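The code construction is based on a channel combining and
channel splitting operation [1] that may be represented as a
linear 2-SBP

$$ \pi:\ \mathsf{B}^2 \to \{\mathsf{B}_\pi^{(0)}, \mathsf{B}_\pi^{(1)}\} \tag{22} $$

that partitions the vector channel $\mathsf{B}^2$, i.e., two independent and
identical instances of $\mathsf{B}$, into two bit channels

$$ \begin{aligned}
\mathsf{B}_\pi^{(0)} &:\ \{0,1\} \to \mathcal{Y}^2 \\
\mathsf{B}_\pi^{(1)} &:\ \{0,1\} \to \mathcal{Y}^2 \times \{0,1\} .
\end{aligned} \tag{23} $$

The labeling rule is given by

$$ \mathcal{L}_\pi:\ [u_0, u_1] \in \{0,1\}^2 \mapsto \mathbf{u} \cdot \mathbf{G}_2 \in \{0,1\}^2 . \tag{24} $$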



Since the average capacity per binary symbol does not change
under an SBP, we denote the mean value of the bit channel
capacities of $\pi$ by $I(\mathsf{B})$ instead of $M_\pi(\mathsf{B}^2)$ in the following.

The construction of a polar code of length $N = 2^n$ may be
equivalently represented by the $n$-fold product concatenation
of $\pi$ as defined in the preceding section. This follows easily
by comparison of the corresponding permutation
matrices. The resulting SBP $\pi^n$ partitions the vector channel
$\mathsf{B}^N$

$$ \pi^n:\ \mathsf{B}^N \to \{\mathsf{B}_{\pi^n}^{(0)}, \ldots, \mathsf{B}_{\pi^n}^{(N-1)}\} \tag{25} $$

into $N$ bit channels

$$ \mathsf{B}_{\pi^n}^{(i)}:\ \{0,1\} \to \mathcal{Y}^N \times \{0,1\}^i \tag{26} $$

($0 \le i < N$) with symmetric capacities

$$ I(\mathsf{B}_{\pi^n}^{(i)}) := I(U_i; Y_0, \ldots, Y_{N-1} \mid U_0, \ldots, U_{i-1}) . \tag{27} $$



Therefore, the transmission of each source symbol $u_i$ can
be described by its own bit channel $\mathsf{B}_{\pi^n}^{(i)}$. The output of each
channel $\mathsf{B}_{\pi^n}^{(i)}$ depends on the values of the symbols of lower
indices $u_0, \ldots, u_{i-1}$. Thus, the channels $\mathsf{B}_{\pi^n}^{(i)}$ imply a specific
decoding order.

For data transmission, only the bit channels with the highest
capacity are used, referred to as information channels. The
data transmitted over the remaining bit channels (so-called
frozen channels) are fixed values known to the decoder. By
this means, the code rate can be chosen in very small steps
of $1/N$ without the need for changing the code construction,
a property especially useful for polar-coded modulation (cf.
Sec. IV-B).

In order to select the optimal set of frozen channels, the
values of the capacities $I(\mathsf{B}_{\pi^n}^{(i)})$ are required. These can either
be obtained by simulation or by density evolution [10].
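
Once the capacities are available, the selection itself is a simple ranking. The following sketch uses illustrative function names and, as example input, the four bit channel capacities of a length-4 polar code over a BEC with capacity 0.5 (these follow from the BEC recursion $c \mapsto c^2$, $c \mapsto 2c - c^2$ known from [1]).

```python
def select_information_set(capacities, num_info):
    """Sorted indices of the num_info bit channels with the largest capacity."""
    ranked = sorted(range(len(capacities)),
                    key=lambda i: capacities[i], reverse=True)
    return sorted(ranked[:num_info])


def frozen_set(capacities, num_info):
    """Sorted indices of the remaining (frozen) bit channels."""
    info = set(select_information_set(capacities, num_info))
    return [i for i in range(len(capacities)) if i not in info]


capacities = [0.0625, 0.4375, 0.5625, 0.9375]  # BEC, capacity 0.5, N = 4
print(select_information_set(capacities, 2))
print(frozen_set(capacities, 2))
```

Changing `num_info` by one changes the code rate by $1/N$ without touching the encoder.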

B. Successive Decoding 

Upon receiving a vector $\mathbf{y}$, being a noisy version of the
codeword $\mathbf{c}$ resulting from transmission over the channel $\mathsf{B}$,
the information bits $u_i$ can be estimated successively for
$i = 0, \ldots, N-1$. Here, information combining [11] of reliability
values obtained from the channel output $\mathbf{y}$ is performed instead
of $\mathbb{F}_2$ arithmetic as in the encoding process.

The successive cancellation (SC) decoding algorithm [1] for
polar codes generates estimates $\hat{u}_i$ of the information symbols
$u_i$ (transmitted over $\mathsf{B}_{\pi^n}^{(i)}$) one after another, making use of
the already decoded symbols $\hat{u}_0, \ldots, \hat{u}_{i-1}$. We denote the
probability that an erroneous decision is made at index $i$, given
that the previous decisions have been correct, by $p_e(\mathsf{B}_{\pi^n}^{(i)})$. Thus,
the word error rate for SC decoding ($\mathrm{WER}_{\mathrm{SC}}$) is given by

$$ \mathrm{WER}_{\mathrm{SC}} = 1 - \prod_{i \in \mathcal{A}} \left(1 - p_e(\mathsf{B}_{\pi^n}^{(i)})\right) , \tag{28} $$

where $\mathcal{A}$ denotes the set of indices of the information channels.
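
Equation (28) translates directly into code. The sketch below assumes that the per-channel error probabilities are already available as a list indexed by bit channel; the numbers in the example call are made up for illustration.

```python
def wer_sc(error_probs, info_set):
    """Word error rate under SC decoding according to (28)."""
    prob_all_correct = 1.0
    for i in info_set:
        prob_all_correct *= 1.0 - error_probs[i]
    return 1.0 - prob_all_correct


# Hypothetical error probabilities of four bit channels; channels 1 and 2 carry data.
print(wer_sc([0.5, 0.1, 0.2, 0.0], [1, 2]))
```

For small error probabilities the result is close to the sum of the individual $p_e$ values over the information set.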






C. Variance of the Bit Channel Capacities 

With increasing block length, the set of bit channels $\mathsf{B}_{\pi^n}^{(i)}$
shows a polarization effect in the sense that the capacity
$I(\mathsf{B}_{\pi^n}^{(i)})$ of almost every bit channel is either near 0 or near
1. The fraction of bit channels that are neither completely
noisy nor completely noiseless tends to zero [1].

In the following, we show that this polarization effect may
be represented by the sequence of variances of the respective
polar codes' bit channel capacities for increasing block length.
The variance of the bit channel capacities of a length-$N$ polar
code around their mean value $I(\mathsf{B})$ is given by

$$ V_{\pi^n}(\mathsf{B}^N) = \frac{1}{N} \sum_{i=0}^{N-1} I(\mathsf{B}_{\pi^n}^{(i)})^2 - I(\mathsf{B})^2 . \tag{29} $$



Using (19), we notice that the sequence of variances increases
monotonically as the block length gets larger, i.e.,

$$ V_{\pi^{n+1}}(\mathsf{B}^{2N}) \ge V_{\pi^n}(\mathsf{B}^N) . \tag{30} $$

Furthermore, from (9) the sequence $\{V_{\pi^n}(\mathsf{B}^N)\}_{n \in \mathbb{N}}$ is
upper-bounded by

$$ V_{\pi^n}(\mathsf{B}^N) \le I(\mathsf{B}) \left(1 - I(\mathsf{B})\right) \tag{31} $$

for all $n \in \mathbb{N}$. According to (8), this maximum variance
is achieved if and only if all bit channel capacities $I(\mathsf{B}_{\pi^n}^{(i)})$
are either 0 or 1, which obviously corresponds to the state
of perfect polarization. As shown by Arıkan [1], the latter is
asymptotically approached while the block length $N$ goes to
infinity; therefore, we have

$$ \lim_{n \to \infty} V_{\pi^n}(\mathsf{B}^N) = I(\mathsf{B}) \left(1 - I(\mathsf{B})\right) . \tag{32} $$



Although we have not yet been able to establish an explicit 
relation between bit channel capacity variance and code error 
performance, one would intuitively expect that increasing the 
variance by a careful code design should correspond to a 
sharper polarization of the bit channels and therefore should 
lead to better performing polar codes in terms of word error 
rate or bit error rate. 

Fig. 1 depicts the variance of the bit channel capacities
for polar codes of various block lengths constructed over the
binary erasure channel (BEC) as a function of its capacity. The
convergence towards the maximum achievable variance (black line)
for increasing block length $N$ can clearly be observed.
Fig. 1. Bit channel variance for polar codes over a BEC $\mathsf{B}_{\mathrm{BEC}}$,
block length $N = 2^n$, $n = 1, 2, 3, 8, 12, 20$. Black: upper bound on the
variance.
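
The behaviour shown in Fig. 1 can be reproduced with a short recursion. The sketch below relies on the known BEC result from [1] that one polarization step turns a BEC of capacity $c$ into two BECs of capacities $c^2$ and $2c - c^2$; the function names are illustrative, and the order of the bit channels is irrelevant for the variance.

```python
def bec_bit_channel_capacities(c, n):
    """Capacities of the N = 2^n bit channels of a polar code over a BEC."""
    caps = [c]
    for _ in range(n):
        # Each bit channel splits into a degraded and an upgraded BEC.
        caps = [v for x in caps for v in (x * x, 2 * x - x * x)]
    return caps


def capacity_variance(caps):
    """Variance (29) of the bit channel capacities around their mean."""
    mean = sum(caps) / len(caps)
    return sum(x * x for x in caps) / len(caps) - mean * mean


c = 0.5
for n in (1, 2, 3, 8, 12):
    var = capacity_variance(bec_bit_channel_capacities(c, n))
    print(n, round(var, 4), "bound:", c * (1 - c))
```

The printed values grow with $n$ and stay below the bound $I(\mathsf{B})(1 - I(\mathsf{B}))$ from (31).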

IV. Multilevel Polar Coding 

We now consider the conventional discrete-time equivalent
system model of $M$-ary digital pulse-amplitude modulation
(PAM), with $M = 2^m$ being a power of 2, with signal
constellations of real-valued signal points (ASK) or of complex-valued
signal points (PSK, QAM, etc.) over a memoryless channel $\mathsf{W}$,
e.g., the AWGN channel.

From an information-theoretic point of view, an optimum
combination of binary coding and $M$-ary modulation follows
the multilevel coding (MLC) principle [5], [6].

A. Multilevel Coding 

In the MLC approach, the $M$-ary channel $\mathsf{W}$ is partitioned
into $m$ bit channels (also called bit levels) by means of an
$m$-SBP

$$ \lambda:\ \mathsf{W} \to \{\mathsf{B}_\lambda^{(0)}, \ldots, \mathsf{B}_\lambda^{(m-1)}\} . \tag{33} $$

The mapping from binary labels to amplitude coefficients is
specified by the labeling rule $\mathcal{L}_\lambda$.

Channel coding is implemented in the MLC setup by using
binary component codes [6] for each of the bit levels $\mathsf{B}_\lambda^{(i)}$
individually with correspondingly chosen code rates $R_i$. The
overall rate (bits per transmission symbol) is given as the
sum $R = \sum_{i=0}^{m-1} R_i$. The receiver then performs multi-stage
decoding (MSD), i.e., it computes reliability information for
decoding of the first bit level, which is passed to the decoder
of the first component code. The decoding results are used for
demapping and decoding of the next bit level, and so on.

The mutual information between the channel input and
channel output of $\mathsf{W}$ assuming equiprobable source symbols is
also referred to as the coded modulation [6], or
constellation-constrained, capacity $C_{\mathrm{cm}}(\mathsf{W})$. It is related to the average
capacity per binary symbol (6) of $\mathsf{W}$ by

$$ C_{\mathrm{cm}}(\mathsf{W}) := I(X;Y) = \sum_{i=0}^{m-1} I(\mathsf{B}_\lambda^{(i)}) = m \cdot M_\lambda(\mathsf{W}) . \tag{34} $$

Since $\lambda$ is an SBP, the coded modulation capacity does not
depend on the specific labeling rule $\mathcal{L}_\lambda$.

A potential drawback of the MLC approach for practical
use lies in the necessity for using several (relatively short)
component codes with varying code rates for the particular bit
levels. According to the capacity rule [6], the code rate $R_i$ for
the $i$-th level should match the bit level capacity, $R_i = I(\mathsf{B}_\lambda^{(i)})$.
Since these capacities vary significantly for the different levels,
channel codes that allow for a very flexible choice of the code
rate are preferred for MLC.

B. Multilevel Polar Coding 

A multilevel polar code of length $mN$, i.e., a multilevel
code with length-$N$ component polar codes over an $M$-ary
constellation, is obtained by the order-$mN$ concatenation of
the $m$-SBP $\lambda$ of MLC and the $N$-SBP $\pi^n$ of the polar code:

$$ \lambda \otimes \pi^n:\ \mathsf{W}^N \to \{\mathsf{B}_{\lambda\otimes\pi^n}^{(0)}, \ldots, \mathsf{B}_{\lambda\otimes\pi^n}^{(mN-1)}\} \tag{35} $$

as defined in (14). The encoding process for this multilevel
polar code is described by the generator matrix

$$ \mathbf{P}_{m,N} \cdot (\mathbf{G}_N \otimes \mathbf{I}_m) \tag{36} $$

with $\mathbf{P}_{m,N}$ as in (20), followed by labeling and mapping to
the $N$ transmit symbols as defined by $\lambda$. Here, $\mathbf{I}_m$ denotes
the $(m, m)$ identity matrix.
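
Written out, the generator matrix (36) amounts to encoding each bit level with its own length-$N$ polar code and then collecting one code symbol per level at every time step. The following sketch illustrates this under two assumptions: the input vector is ordered such that level $i$ carries the components $Ni, \ldots, Ni + N - 1$ (the bit channel index $k_2 i + j$ with $k_2 = N$), and the component encoder uses the recursive form of Arıkan's transform from [1]. The function names are illustrative.

```python
def polar_encode(u):
    """Recursive polar transform c = u * G_N over F_2 (N a power of two)."""
    if len(u) == 1:
        return list(u)
    # Combine neighbouring pairs, then encode both halves with G_{N/2}.
    upper = polar_encode([a ^ b for a, b in zip(u[0::2], u[1::2])])
    lower = polar_encode(u[1::2])
    return upper + lower


def multilevel_polar_encode(u, m):
    """Return the N binary m-tuples (labels) of a multilevel polar code word."""
    if len(u) % m != 0:
        raise ValueError("input length must be a multiple of m")
    N = len(u) // m
    # Level i is a component polar code fed with u[N*i : N*(i+1)].
    codewords = [polar_encode(u[N * i:N * (i + 1)]) for i in range(m)]
    # The label sent at time j holds one code symbol from every level.
    return [tuple(codewords[i][j] for i in range(m)) for j in range(N)]


print(multilevel_polar_encode([1, 0, 1, 1], m=2))
```

Each returned tuple $[b_0, \ldots, b_{m-1}]$ would then be mapped to a signal point by the labeling rule of $\lambda$.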

We remark that the selection of frozen channels, and thus
the rate allocation, is done in exactly the same way as
for a usual binary polar code by determining the symmetric
capacities $I(\mathsf{B}_{\lambda\otimes\pi^n}^{(i)})$ ($0 \le i < mN$) and choosing the most
reliable bit channels for data transmission. Therefore, the
explicit application of a rate allocation rule to the particular
component codes, as considered in the original MLC
approach, is not needed in case of multilevel polar codes.
However, it has been shown [12] that the rate allocations
obtained by this method basically equal those obtained from
the capacity rule.

According to (19), the variance of the bit channels of a
multilevel polar code with length-$N$ component codes is given
by

$$ V_{\lambda\otimes\pi^n}(\mathsf{W}^N) = V_\lambda(\mathsf{W}) + \frac{1}{m} \sum_{i=0}^{m-1} V_{\pi^n}\left((\mathsf{B}_\lambda^{(i)})^N\right) . \tag{37} $$

Thus, the SBP $\lambda$, which represents the modulation step, may
be seen as the first polarization step of a multilevel polar code.
From this representation, it is clear that $\lambda$ should be chosen
such that it maximizes the term (37).

In this approach, both binary coding and $2^m$-ary modulation
are represented in a unified form as a sequential binary channel
partition of the vector channel $\mathsf{W}^N$. Both should be designed
according to the polarization principle, i.e., the maximization of
the variance of the bit channel capacities (37) under successive
cancellation (or, equivalently, multi-stage) decoding by
careful choice of the labeling rule.

C. Influence of the Labeling Rule 

Here, we focus on the set-partitioning (SP) labeling approach
(corresponding to $\lambda_{\mathrm{SP}}$) by Ungerboeck [13] and Gray labeling
$\lambda_{\mathrm{G}}$, which aims to generate bit levels that are as independent as
possible [14].
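
The two labeling rules can be written down explicitly for $2^m$-ary ASK. The sketch below makes two assumptions that are not spelled out in the text: SP labeling is taken to be the natural binary labeling with $b_0$ as the least significant bit, and Gray labeling is taken to be the binary-reflected Gray code with the same bit order. The helper `min_subset_distance` is hypothetical and only illustrates the set-partitioning property.

```python
def ask_amplitudes(m):
    """Equidistant amplitudes -(M-1), ..., -1, +1, ..., M-1 for M = 2^m."""
    M = 2 ** m
    return [2 * p - (M - 1) for p in range(M)]


def sp_labeling(m):
    """Natural binary labeling, label [b_0, ..., b_{m-1}] with b_0 the LSB."""
    return {tuple((p >> i) & 1 for i in range(m)): a
            for p, a in enumerate(ask_amplitudes(m))}


def gray_labeling(m):
    """Binary-reflected Gray labeling: neighbouring points differ in one bit."""
    return {tuple(((p ^ (p >> 1)) >> i) & 1 for i in range(m)): a
            for p, a in enumerate(ask_amplitudes(m))}


def min_subset_distance(labeling, prefix):
    """Minimum distance among the points whose label starts with prefix."""
    prefix = tuple(prefix)
    pts = sorted(a for bits, a in labeling.items()
                 if bits[:len(prefix)] == prefix)
    return min(b - a for a, b in zip(pts, pts[1:]))


sp = sp_labeling(3)
for prefix in ((), (0,), (0, 0)):
    print(prefix, min_subset_distance(sp, prefix))
```

With SP labeling, fixing the bits of the lower levels leaves subsets whose minimum distance doubles from level to level, which is what makes the bit level capacities so different from each other.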

Fig. 2 depicts the variance of the bit levels for ASK
modulation using both SP and (binary-reflected) Gray labeling.

Fig. 2. Bit level variance for $2^m$-ary ASK signalling with multi-stage
decoding over the AWGN channel ($m = 2, 4, 8$). Red: SP labeling, gray:
Gray labeling.

It can be observed that, except for small capacities $M_\lambda(\mathsf{W})$,
the SP labeling approach leads to significantly larger bit
level variances compared to the Gray labeling, as expected.
Therefore, SP labeling should preferably be applied for
multilevel polar codes. Furthermore, when compared to the
corresponding variance curves of polar codes over a single
B-DMC for $N = 2, 4, 8$ as shown in Fig. 1, the achieved bit level
variance is significantly higher, especially in case of SP labeling,
emphasizing the importance of the careful choice of
the labeling $\mathcal{L}_\lambda$ in this first step of polarization for multilevel
polar codes.

V. Simulation Results 

We finally present some numerical results in terms of
rate-vs.-power-efficiency plots in order to illustrate the error
performance of polar-coded modulation with SC decoding over
the AWGN channel.

Besides common Monte-Carlo simulations, we also present
approximate results obtained by density evolution (DE) [10].
Here, for multilevel polar codes, we numerically determine
the bit channel capacities $I(\mathsf{B}_\lambda^{(i)})$ of the transform $\lambda$ as
defined in (33). Then, we calculate the bit channel capacities,
and the corresponding error probabilities $p_e(\mathsf{B}_{\lambda\otimes\pi^n}^{(i)})$, of
the component polar codes by performing density evolution
with the well-known Gaussian approximation [15], i.e., we
simply assume the output bit channels of each SBP in the
chain $\lambda \otimes \pi \otimes \ldots \otimes \pi$ to be Gaussian. The word error rate
under SC decoding $\mathrm{WER}_{\mathrm{SC}}$ is obtained from (28).

Fig. 3 depicts the performance of multilevel polar codes
with 16-ASK modulation under SC decoding for different
labelings $\mathcal{L}_\lambda$ and various block lengths. The large performance
loss of Gray labeling w.r.t. SP labeling can clearly be observed.
Furthermore, comparing the results obtained by DE with
the simulation points shows that the inaccuracy induced by the
Gaussian assumption is negligible.

Fig. 3. 16-ASK / AWGN: Rate vs. power efficiency of multilevel polar
codes using SP (blue) and Gray labeling (dashed gray). Markers: Simulation
points for $mN = 512$, lines: DE results with overall block length (from right
to left) $mN = 2^k$, $k = 9, 11, 13, 15$. Bold blue: coded-modulation capacity,
dashed black: Shannon bound for real constellations.

VI. Conclusions

In this paper, we have combined polar coding and multilevel
coding by representing both as sequential binary partitions
(SBPs). Based on this representation, we have derived rules
for optimization of multilevel polar codes.

Future work will extend this novel framework of channel 
partitions to bit-interleaved coded modulation (BICM) and 
incorporate fading scenarios. 